Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Targeted Gene Metagenomic Data Analysis

amplicon-based metagenomic analysis because it can identify bacterial species. One or more regions of the gene are amplified by PCR, and the resulting amplicons are then sequenced with high-throughput technologies. The reads therefore usually come from a single targeted gene but from many different species, and the analysis focuses on identifying the taxonomic groups in the sample and their abundance. After quality control, features (unique sequences representing taxonomic groups) are obtained by either clustering or denoising. There are three kinds of clustering: de novo clustering, open-reference clustering, and closed-reference clustering.
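De novo clustering can be illustrated with a short sketch of greedy, abundance-ordered centroid clustering, the general strategy used by tools such as VSEARCH. Identical reads are first dereplicated. The most abundant unique sequence becomes the first OTU centroid, and each remaining sequence joins the first centroid it matches at or above an identity threshold or else founds a new OTU. This is a minimal sketch under stated assumptions: the 97% threshold is a conventional but assumed value, reads are assumed to be pre-trimmed to equal length, and identity is counted position by position without alignment, whereas real tools align sequences and handle indels. The function names are hypothetical. Closed-reference clustering would instead match reads against centroids taken from a reference database (discarding unmatched reads), and open-reference clustering would cluster the unmatched reads de novo.

```python
from collections import Counter
from typing import Dict, Iterable


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-trimmed reads (no alignment)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads: Iterable[str], threshold: float = 0.97) -> Dict[str, int]:
    """Hypothetical greedy de novo clustering; returns {centroid sequence: total reads}."""
    counts = Counter(reads)  # dereplicate identical reads
    # Most abundant unique sequences are considered first (ties broken alphabetically).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    otus: Dict[str, int] = {}  # insertion order = order in which centroids were created
    for seq, n in ordered:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # assign these reads to the first matching OTU
                break
        else:
            otus[seq] = n  # no centroid close enough: start a new OTU
    return otus


if __name__ == "__main__":
    base = "ACGT" * 10            # 40-nt toy amplicon
    variant = "T" + base[1:]      # one mismatch -> 97.5% identity to base
    other = "G" * 40              # unrelated sequence
    reads = [base] * 10 + [variant] * 2 + [other] * 5
    print(len(greedy_otu_clustering(reads)))  # → 2
```

With `threshold=1.0` every unique sequence, including ones that differ only by a sequencing error, becomes its own cluster. This is why methods that report exact sequences rely on denoising rather than plain dereplication.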

Each of these clustering methods generates OTUs (operational taxonomic units). Denoising, on the other hand, attempts to correct base-calling errors so that erroneous reads are not classified as distinct sequences; it produces ASVs (amplicon sequence variants), which are exact unique sequences that represent the organisms in the sample. Three common denoising algorithms are DADA2, Deblur, and UNOISE2. The most commonly used program for amplicon-based metagenomic data analysis is QIIME2, which implements both clustering and denoising methods. To analyze data with QIIME2, the raw data must first be imported into QIIME2 artifacts. Several analyses can then be conducted with QIIME2, including identification of taxonomic groups and their abundance, phylogenetic analysis, and diversity analysis.
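The abundance and diversity analyses can be sketched on a feature table, that is, the per-sample read counts for each OTU or ASV. The sketch below computes relative abundance, the Shannon index as one example of alpha (within-sample) diversity, and Bray-Curtis dissimilarity as one example of beta (between-sample) diversity. It is a minimal sketch, assuming the feature table is held as plain Python dictionaries, a hypothetical layout chosen only for illustration; QIIME2 itself stores feature tables as artifacts and computes diversity metrics through its own plugins. The Shannon index here uses the natural logarithm, and other tools may use a different base.

```python
import math
from typing import Dict

# Hypothetical layout: feature ID (OTU/ASV) -> read count for a single sample.
FeatureCounts = Dict[str, int]


def relative_abundance(counts: FeatureCounts) -> Dict[str, float]:
    """Proportion of the sample's reads assigned to each feature."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def shannon(counts: FeatureCounts) -> float:
    """Shannon index H = -sum(p_i * ln p_i); features with zero reads are skipped."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a: FeatureCounts, b: FeatureCounts) -> float:
    """Bray-Curtis dissimilarity: 0 = identical counts, 1 = no shared features."""
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


if __name__ == "__main__":
    even = {"otu1": 50, "otu2": 50}   # two equally abundant features
    single = {"otu1": 100}            # one feature only
    print(relative_abundance(even))   # → {'otu1': 0.5, 'otu2': 0.5}
    print(round(shannon(even), 4))    # → 0.6931
    print(bray_curtis(even, single))  # → 0.5
```

In practice, samples are usually rarefied or otherwise normalized to a common sequencing depth before diversity metrics are compared; that step is omitted here.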
